Abstract



We consider Glauber dynamics (starting from an extremal configuration) in a monotone
spin system, and show that interjecting extra updates cannot increase the expected
Hamming distance or the total variation distance to the stationary distribution. We
deduce that for monotone Markov random fields, when block dynamics contracts a
Hamming metric, single-site dynamics mixes in $O(n \log n)$ steps on an $n$-vertex graph. In
particular, our result completes work of Kenyon, Mossel and Peres concerning Glauber
dynamics for the Ising model on trees. Our approach also shows that on bipartite graphs,
alternating updates systematically between odd and even vertices cannot improve the
mixing time by more than a factor of $\log n$ compared to updates at uniform random
locations on an $n$-vertex graph. Our result is especially effective in comparing block and
single-site dynamics; it has already been used in works of Martinelli, Sinclair, Mossel,
Sly, Ding, Lubetzky, and Peres in various combinations.

1 Introduction 

In a number of cases, mixing rates have been determined for Glauber dynamics using block
updates, but only rough estimates have been obtained for single-site dynamics. Examples
include the Ising model on trees and the monomer-dimer model on $\mathbb{Z}^d$. In this work, we
employ a "censoring lemma" for monotone systems to transport bounds for block dynamics
to bounds for single-site dynamics; sharp estimates result in several situations.

Our main interest is in spin systems with nearest-neighbor interactions on a finite graph
$G$. A configuration consists of a mapping $\sigma$ from the set $V$ of sites of $G$ to a fixed partially
ordered set $S$ of "spins". The probability $\pi(\sigma)$ of a configuration $\sigma$ is given by
where $Z$ is the appropriate normalizing constant. More generally, our results apply when $\pi$
defines a monotone Markov random field. In single-site Glauber dynamics, at each step, a
uniformly random site is "updated" and assumes a new spin according to $\pi$ conditioned on the
spins of its neighbors. The resulting Markov chain is irreducible, aperiodic, and has unique
stationary distribution $\pi$. Let $p^t(\omega, \cdot)$ be the distribution of configurations after $t$ steps, with
initial state $\omega$. Let $\|\mu - \nu\| = \frac{1}{2} \sum_{\sigma} |\mu(\sigma) - \nu(\sigma)|$ be the total variation distance. The mixing
time $T_G(\varepsilon)$ for the dynamics is the least $t$ such that $\|p^t(\omega, \cdot) - \pi\| \le \varepsilon$ for any $\omega \in \Omega$. In
discrete-time block dynamics, a family $\mathcal{B}$ of "blocks" of sites is provided. At each step, a
block $B \in \mathcal{B}$ is selected uniformly at random and a configuration on $B$ is selected according
to $\pi$ conditioned on the spins of the sites in the exterior boundary of $B$. A useful method of
bounding mixing times is to first bound the spectral gap of the block dynamics using path
coupling, and then use comparison theorems for the spectral gap to derive a bound for $T_G(\varepsilon)$.
In key examples of Glauber dynamics for the Ising model on lattices and trees, this method
tends to overestimate $T_G(\varepsilon)$ by a factor of $n$ on an $n$-vertex graph.
Stated informally, our main results are: 

• In Glauber dynamics for a monotone (i.e., attractive) spin system, started at the top or 
bottom state, censoring updates increases the distance from stationarity. 

• Suppose a monotone spin system on an n-vertex graph G has a block dynamics which 
contracts (on average) a Hamming metric, and single-site dynamics on each block with 
arbitrary boundary conditions mixes in a bounded time. If the collection of blocks 
can be partitioned into a bounded number of layers such that blocks in each layer are 
nonadjacent, and weights within a block have a bounded ratio, then discrete time single 
site dynamics on $G$ mixes (in total variation) in $O(n \log n)$ steps.

• In [12] (see also [3]) it was proved that for the Ising model on an $n$-vertex $b$-ary tree,
block dynamics with large bounded blocks contracts a (weighted) Hamming metric at
temperatures above the extremality threshold. This, in conjunction with our main results,
implies that single-site dynamics on these trees mixes in $O(n \log n)$ steps. (See [19]
for refinements of this theorem using log-Sobolev inequalities.)

• If $H$ is a subgraph of $G$ and only one vertex in $H$ is adjacent to vertices in $G \setminus H$,
then continuous-time Glauber dynamics on $H$ mixes faster than the restriction to $H$ of
continuous-time Glauber dynamics on $G$.

• For an $n$-vertex bipartite graph, alternating updating of all the "odd" and all the "even"
vertices cannot mix much faster than systematic updates (enumerating the vertices in
an arbitrary order): the odd-even updates can reduce the number of vertices updated
at most by a factor of two. Similarly, the odd-even updates can be faster than uniformly
random updates by a factor of at most $\log n$.

See §1.2 for further discussion of block dynamics, and §2-3 for proofs. A preliminary version 
of our results, including the proof of Theorem 1.1, was presented in the 2005 lectures [23]. 

1.1 Terminology 

In what follows, a system $(\Omega, S, V, \pi)$ consists of a finite set $S$ of spins, a set $V$ of sites, a space
$\Omega \subseteq S^V$ of configurations (assignments of spins to sites), and a distribution $\pi$ on $\Omega$, which
will serve as the stationary distribution for our Glauber dynamics. We assume that $\pi(\omega) > 0$
for $\omega \in \Omega$. The Ising model (where $S = \{+, -\}$ and $\Omega = S^V$) is the basic example; we allow
$\Omega$ to be a strict subset of $S^V$ to account for "hard constraints" such as those imposed by the
hard-core gas model.

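We denote by $\sigma_v^s$ the configuration obtained from $\sigma$ by changing its value at $v$ to $s$, that
is, $\sigma_v^s(v) = s$ and $\sigma_v^s(u) = \sigma(u)$ for all $u \ne v$. Let $\sigma_v^\bullet$ be the set of configurations
$\{\sigma_v^s\}_{s \in S}$ in $\Omega$. The update $\mu_v$ at $v$ of a distribution $\mu$ on $\Omega$ is defined by

$$\mu_v(\sigma) = \frac{\pi(\sigma)}{\pi(\sigma_v^\bullet)} \, \mu(\sigma_v^\bullet) \quad \text{for } \sigma \in \Omega. \tag{1}$$

For measures $\mu$ and $\nu$ on a poset $\Gamma$, we write $\nu \preceq \mu$ to indicate that $\mu$ stochastically
dominates $\nu$, that is, $\int g \, d\nu \le \int g \, d\mu$ for all increasing functions $g : \Gamma \to \mathbb{R}$.

The system $(\Omega, S, V, \pi)$ is called monotone if $S$ is totally ordered, $S^V$ is endowed with the
coordinate-wise partial order, and whenever $\sigma, \tau \in \Omega$ satisfy $\sigma \le \tau$, then for any vertex $v \in V$
we have

$$\left\{ \frac{\pi(\sigma_v^s)}{\pi(\sigma_v^\bullet)} \right\}_{s \in S} \preceq \left\{ \frac{\pi(\tau_v^s)}{\pi(\tau_v^\bullet)} \right\}_{s \in S} \tag{2}$$

as distributions on the spin set $S$.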
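
As a minimal sketch of the update rule (1), assume a ferromagnetic Ising model on a three-vertex path with an illustrative inverse temperature `BETA = 0.5`. The graph, the value of `BETA`, and all function names below are assumptions made for this example and do not come from the text. The code stores a distribution as a dictionary, applies the map $\mu \mapsto \mu_v$, and lets one check that $\pi$ itself is left unchanged by any update.

```python
import itertools
import math

# Hypothetical toy system: Ising model on the path 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]
N_SITES = 3
BETA = 0.5  # illustrative inverse temperature (assumption)

CONFIGS = list(itertools.product([-1, 1], repeat=N_SITES))


def ising_pi():
    """Gibbs weights proportional to exp(BETA * sum over edges of s_u * s_v)."""
    weights = {s: math.exp(BETA * sum(s[u] * s[v] for u, v in EDGES)) for s in CONFIGS}
    z = sum(weights.values())  # normalizing constant
    return {s: w / z for s, w in weights.items()}


def fiber(sigma, v):
    """Configurations agreeing with sigma off v (the set written sigma_v^bullet)."""
    return [sigma[:v] + (s,) + sigma[v + 1:] for s in (-1, 1)]


def update(mu, pi, v):
    """Equation (1): mu_v(sigma) = pi(sigma) / pi(fiber) * mu(fiber)."""
    new = {}
    for sigma in CONFIGS:
        fib = fiber(sigma, v)
        pi_fib = sum(pi[t] for t in fib)
        mu_fib = sum(mu[t] for t in fib)
        new[sigma] = pi[sigma] / pi_fib * mu_fib
    return new


pi = ising_pi()
top = {s: 0.0 for s in CONFIGS}
top[(1, 1, 1)] = 1.0          # point mass at the top configuration
mu = update(top, pi, 1)       # update the middle site
```

After this single update, `mu` is supported on the two configurations that agree with the top configuration at sites 0 and 2, and `update(pi, pi, v)` returns `pi` for every site `v`.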

1.2 Main results

Theorem 1.1. Let $(\Omega, S, V, \pi)$ be a monotone system and let $\mu$ be the distribution on $\Omega$ which
results from successive updates at sites $v_1, \ldots, v_m$, beginning at the top configuration. Let $\nu$
be defined similarly but with updates only at a subsequence $v_{i_1}, \ldots, v_{i_k}$. Then $\mu \preceq \nu$, and
$\|\mu - \pi\| \le \|\nu - \pi\|$ in total variation. Moreover, this also holds if the sequence $v_1, \ldots, v_m$ and
its subsequence $i_1, \ldots, i_k$ are chosen at random according to any prescribed distribution.

See §2 for the proof, which shows also that the assumption of starting from the top
configuration can be replaced by the assumption that the dynamics starts at a distribution $\mu_0$ where
the likelihood ratio $\mu_0/\pi$ is weakly increasing. Other assumptions, in particular monotonicity
of the system, cannot be dispensed with, as shown recently by Holroyd [11].
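
The statement of Theorem 1.1 can be checked numerically on a small example. The sketch below assumes a ferromagnetic Ising model on a four-vertex path with an illustrative coupling `BETA = 0.4`; the graph, the coupling, the particular update sequences, and the function names are all assumptions chosen for illustration. It computes the exact distributions after a full update sequence and after a censored subsequence, both started from the top configuration, and compares their total variation distances to $\pi$.

```python
import itertools
import math

EDGES = [(0, 1), (1, 2), (2, 3)]  # hypothetical 4-vertex path
N_SITES = 4
BETA = 0.4                         # illustrative coupling (assumption)
CONFIGS = list(itertools.product([-1, 1], repeat=N_SITES))


def gibbs():
    """Ising Gibbs distribution on the path."""
    w = {s: math.exp(BETA * sum(s[u] * s[v] for u, v in EDGES)) for s in CONFIGS}
    z = sum(w.values())
    return {s: x / z for s, x in w.items()}


def update(mu, pi, v):
    """Single-site update of the distribution mu at site v, as in equation (1)."""
    new = {}
    for s in CONFIGS:
        fib = [s[:v] + (x,) + s[v + 1:] for x in (-1, 1)]
        new[s] = pi[s] * sum(mu[t] for t in fib) / sum(pi[t] for t in fib)
    return new


def run(sites, pi):
    """Distribution after updating the given sites in order, from the top state."""
    mu = {s: 0.0 for s in CONFIGS}
    mu[(1,) * N_SITES] = 1.0
    for v in sites:
        mu = update(mu, pi, v)
    return mu


def tv(mu, pi):
    """Total variation distance between two distributions on CONFIGS."""
    return 0.5 * sum(abs(mu[s] - pi[s]) for s in CONFIGS)


pi = gibbs()
full = [0, 1, 2, 3, 1, 2, 0, 3]
censored = [0, 2, 3, 1, 3]         # positions 0, 2, 3, 4, 7 of `full`
tv_full = tv(run(full, pi), pi)
tv_censored = tv(run(censored, pi), pi)
```

Under these assumptions the theorem predicts `tv_full <= tv_censored`, and the same comparison should hold for any other subsequence of `full`.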

Next, we discuss block dynamics and the contraction method to bound mixing times for
spin systems.

Let us endow $\Omega \subseteq S^V$ with the Hamming metric $H(\sigma, \tau) = |\{v \in V : \sigma(v) \ne \tau(v)\}|$. (More
generally, it is sometimes fruitful to consider a weighted metric.) The Kantorovich distance
$\rho(\mu, \nu)$ between two distributions on $\Omega$ is defined to be the minimum over all couplings of $\mu$
and $\nu$ of $\mathbf{E} H(\sigma, \tau)$, where $\sigma$ is drawn from $\mu$ and $\tau$ from $\nu$. The fact that this metric satisfies
the triangle inequality is proved, e.g., in Chapter 14 of [13] and is essentially equivalent to the
path-coupling theorem of [4].

Given a subset $B$ of $V$, let $\sigma_B^\bullet$ be the set of configurations $\tau \in \Omega$ such that $\tau$ agrees with $\sigma$
on $V \setminus B$. For $\sigma \in \Omega$, the block update $U_B \sigma$ is a measure on $\sigma_B^\bullet$ defined by
$(U_B \sigma)(\omega) = \pi(\omega)/\pi(\sigma_B^\bullet)$ for $\omega \in \sigma_B^\bullet$. Thus $U_B \sigma$ is $\pi$ conditioned on $\sigma_B^\bullet$. For a collection of
blocks $\mathcal{B}$, the $\mathcal{B}$-averaged block update of $\sigma \in \Omega$ yields a random configuration with
distribution $\frac{1}{|\mathcal{B}|} \sum_{B \in \mathcal{B}} U_B \sigma$. The block dynamics determined by $\mathcal{B}$ consists of performing
successive $\mathcal{B}$-averaged block updates.

We say that a block dynamics is contracting if for any two configurations $\sigma$ and $\tau$, the
expected number of discrepancies after a block update is smaller by a factor of $1 - \gamma|B|/|V|$ or
less, where $\gamma$ is a constant and $|B|$ is the number of sites in a block. The triangle inequality
for the Kantorovich metric implies that it suffices to verify this contraction condition when $\sigma$
and $\tau$ differ at a single site. In our setting, contraction implies a bound of order $|V| \log |V|$
on the mixing time, since the number of blocks is of order $|V|$. When the blocks are cubes in
a lattice, a sufficient condition for contraction of block dynamics is strong spatial mixing, as
defined and studied in [17, 15, 16, 9].

The system $(\Omega, S, V, \pi)$ is a Markov random field if for any set $B \subset V$ and $\sigma \in \Omega$, the
distribution $U_B \sigma$ depends only on the restriction of $\sigma$ to $\partial B$, the set of vertices in $B^c$ that are
adjacent to $B$.

The next theorem is intended to illustrate how, in a particular case, Theorem 1.1 can be
used to deduce rapid mixing for single-site dynamics from a contraction condition for block
dynamics.

Theorem 1.2. Let $\Omega$ be the configuration space for a monotone Markov random field on the
$d$-dimensional toroidal grid $V = [0, N-1]^d$. Let $(\ell+1) \mid N$ and for each $v \in V$, let $B_v$ be the
cube of side-length $\ell$ anchored at $v$. If the corresponding block dynamics is contracting, and
the single-site dynamics restricted to any block has uniformly bounded mixing time (for all
boundary conditions), then single-site dynamics on all of $V$ has mixing time $O(|V| \log |V|)$,
where the implied constant depends only on the contraction parameter $\gamma$ and on $\ell$.

Proof. For any $u \in V$ and any block $B$, let

$$\Phi_u(B) = \max_{\sigma, s} \rho(U_B \sigma, U_B \sigma')$$

where $U_B \sigma$ is the distribution that results when $B$ is updated from configuration $\sigma$, and $\sigma' = \sigma_u^s$
is obtained from $\sigma$ by changing the spin at $u$ to $s$. Since $H(\sigma, \sigma') = 1$, we have $\Phi_u(B) = 1$
when neither $B$ nor $\partial B$ contains $u$. If $u \in B$, then $\Phi_u(B) = 0$, so the key case is when $u$ is on
the exterior boundary of $B$.

Since the dynamics for updating a random block $B$ is assumed to be contracting, we have
in particular that for some constant $\gamma$, and any $u \in V$,

$$\frac{\gamma \ell^d}{N^d} \le \mathbf{E}\Delta = \mathbf{P}(B \ni u) - \frac{1}{N^d} \sum_{\partial B \ni u} \Phi_u(B) = \frac{\ell^d}{N^d} - \frac{1}{N^d} \sum_{\partial B \ni u} \Phi_u(B) \tag{3}$$

where $\Delta$ is the decrease in Hamming distance between $\sigma$ and $\sigma'$ caused by the update.

Let $t$ be the number of single-site updates, performed uniformly at random on the sites
inside a box $B$, needed (regardless of boundary spins) to bring the Kantorovich distance
between the resulting configuration on $B$ and the block-update configuration down to at most
$\delta$, where $\delta = \gamma \ell^d/(4|\partial B|) = \gamma/(1 + ((\ell+2)/\ell)^d)$. We may assume $t \ge \ell^d \log(\ell^d)$ so that
virtually all of the sites in $B$ actually get updated. Letting $U_B^t \sigma$ denote the distribution that
results when $t$ single-site updates are performed on $B$, we have that the consequent decrease
in Hamming distance satisfies



EAt = P(m is updated) - P^^*b^^ ^b^') 



dBBu 



N 



dB^u 

> (I — ^) 



Suppose next that $T$ is a nonnegative-integer-valued random variable that satisfies
$\mathbf{P}(T < t) \le \delta/\ell^d$. Since the Hamming distance of any two configurations is bounded by $\ell^d$, if we
perform $T$ random single-site updates on the block $B$, we get

pd „ 

EAt > 7—7 - -4 V (4) 

dBBu 



so this "approximate block update" is still contracting. 

Suppose now that we choose $j = (j_1, \ldots, j_d)$ uniformly at random in $\{0, \ldots, \ell\}^d$ and update
(in the normal fashion) all the blocks $B_{j+(\ell+1)k}$ where $k \in \mathbb{Z}^d$. These blocks are disjoint, and,
moreover, no block has an exterior neighbor belonging to another block, hence it makes no
difference in what order the updates are made. We call this series of updates a "global block
update," and claim that it is contracting, meaning, in this case, that a single global update
reduces the Hamming distance between any two configurations $\sigma$ and $\tau$ by a constant factor
$1 - \gamma$.

To see this, we reduce to the case where $\sigma$ and $\tau$ differ only at a vertex $u$ and average over
the choice of $j$ to get that the expected decrease in Hamming distance is

$$\frac{\ell^d}{(\ell+1)^d} - \frac{1}{(\ell+1)^d} \sum_{\partial B \ni u} \Phi_u(B),$$

which, by comparing with (3), exceeds $\gamma \, (\ell/(\ell+1))^d$.

If the updates of the blocks $B_{j+(\ell+1)k}$ are of the approximate variety as described above,
we get an "approximate global block update" which still contracts.

Let us now consider Glauber dynamics (successive updates of random single sites) for time
$2t|V|/\ell^d$, with the object of showing that this will reduce the expected Hamming distance
between any two configurations by at least a constant factor. The number of updates that
hit a particular block $B$ will then be a binomially distributed random variable $T$ with mean
$2t$; its probability of falling below $t$ is bounded above by $e^{-t/4}$ (see, e.g., [1], Theorem A.1.13,
p. 312). Recall that we took $t \ge \ell^d \log(\ell^d)$; if $t < 4 \log(\ell^d/\delta)$ then we increase $t$ to equal the
larger right-hand side, and note that it still depends only on $\gamma$, $\ell$ and not on $N$. We have
thus ensured that $\mathbf{P}(T < t) \le \delta/\ell^d$ as required for (4).
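
The tail bound used in the preceding paragraph can be checked numerically. In the sketch below, the parameter `blocks` stands in for the ratio $|V|/\ell^d$, so that $T$ is binomial with $2t \cdot \texttt{blocks}$ trials and success probability $1/\texttt{blocks}$; the function names and the specific values of `t` and `blocks` are assumptions chosen for illustration. The code computes the exact lower tail $\mathbf{P}(T < t)$ and the bound $e^{-t/4}$ side by side.

```python
import math


def binom_lower_tail(m, p, t):
    """Exact P(T < t) for T ~ Binomial(m, p)."""
    return sum(math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(t))


def check(t, blocks):
    """Return (exact tail, bound e^{-t/4}) when T has mean 2t.

    `blocks` plays the role of |V| / l^d (an illustrative assumption):
    there are 2 * t * blocks updates, each hitting the block with
    probability 1 / blocks.
    """
    m = 2 * t * blocks
    p = 1.0 / blocks
    return binom_lower_tail(m, p, t), math.exp(-t / 4)


results = {t: check(t, blocks=50) for t in (4, 8, 16, 32)}
```

Each entry of `results` pairs the exact tail probability with the bound, and the first should not exceed the second.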

It follows that if we choose $j$ uniformly at random as above and censor all updates of sites
not in $\bigcup_k B_{j+(\ell+1)k}$, then we have achieved an approximate global block update, and thus a
contraction of expected Hamming distance by a factor $1 - \gamma/4$.

We deduce that $O(\log |V|)$ approximate global block updates suffice to reduce the maximal
Kantorovich distance from its initial value $|V|$ (the Hamming distance between the top
and bottom configurations) to any desired small constant. Recall that Kantorovich distance
dominates total variation distance, and each approximate global block update involves $O(|V|)$
single-site updates, with censoring of updates that fall on the (random) boundary. Thus with
this censoring, uniformly random single-site updates mix in time $O(|V| \log |V|)$.

By Theorem 1.1, censoring these updates cannot improve mixing time, hence the mixing
time for standard single-site Glauber dynamics is again $O(|V| \log |V|)$. □

In the above theorem the periodic boundary and divisibility condition were assumed only
for convenience in the proof, variations of which can be applied in many other settings. Indeed,
since we announced our censoring inequality in 2001, other applications to block dynamics have
been made by Martinelli and Sinclair [18], Martinelli and Toninelli [20], Mossel and Sly [21],
Ding, Lubetzky and Peres [5], and Ding and Peres [6]. In particular, [6] uses the censoring
inequality to prove a uniform lower bound asymptotic to $n \log n / 4$ for the mixing time of
Glauber dynamics of the Ising model on any $n$-vertex graph.

Note that even if the Markov random field is not monotone, our proof shows mixing time
$O(|V| \log |V|)$ for censored single-site dynamics; this improves by a log factor Corollary 3.3 of
Van den Berg and Brouwer [2].

2 Proof of the censoring inequality (Theorem 1.1) 

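Lemma 2.1. Let $(\Omega, S, V, \pi)$ be a monotone system, let $\mu$ be any distribution on $\Omega$, and let $\mu_v$
be the result of updating $\mu$ at the site $v \in V$. If $\mu/\pi$ is increasing on $\Omega$, then so is $\mu_v/\pi$.

Proof. Define $f : S^V \to \mathbb{R}$ by

$$f(\sigma) := \max\left\{ \frac{\mu(\omega)}{\pi(\omega)} : \omega \in \Omega, \ \omega \le \sigma \right\} \tag{5}$$

with the convention that $f(\sigma) = 0$ if there is no $\omega \in \Omega$ satisfying $\omega \le \sigma$. Then $f$ is increasing
on $S^V$, and $f$ agrees with $\mu/\pi$ on $\Omega$.

Let $\sigma \le \tau$ be two configurations in $\Omega$; we wish to show that

$$\frac{\mu_v(\sigma)}{\pi(\sigma)} \le \frac{\mu_v(\tau)}{\pi(\tau)}. \tag{6}$$

Note first that for any $s \in S$,

$$f(\sigma_v^s) \le f(\tau_v^s),$$

since $f$ is increasing. Furthermore, $f(\tau_v^s)$ is an increasing function of $s$. Thus, by (1),

$$\frac{\mu_v(\sigma)}{\pi(\sigma)} = \frac{\mu(\sigma_v^\bullet)}{\pi(\sigma_v^\bullet)} = \sum_{s \in S} f(\sigma_v^s) \frac{\pi(\sigma_v^s)}{\pi(\sigma_v^\bullet)} \le \sum_{s \in S} f(\tau_v^s) \frac{\pi(\sigma_v^s)}{\pi(\sigma_v^\bullet)} \le \sum_{s \in S} f(\tau_v^s) \frac{\pi(\tau_v^s)}{\pi(\tau_v^\bullet)} = \frac{\mu_v(\tau)}{\pi(\tau)},$$

where the last inequality follows from the stochastic domination guaranteed by monotonicity
of the system. □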

Lemma 2.2. Suppose that $S$ is totally ordered. If $\alpha$ and $\beta$ are probability distributions on $S$
such that $\alpha/\beta$ is increasing on $S$ and $\beta(s) > 0$ for all $s \in S$, then $\alpha \succeq \beta$.

Proof. Let $g$ be any increasing function on $S$; then, with all sums taken over $s \in S$,

$$\sum g(s) \alpha(s) = \sum g(s) \frac{\alpha(s)}{\beta(s)} \beta(s) \ge \sum g(s) \beta(s) \cdot \sum \frac{\alpha(s)}{\beta(s)} \beta(s) = \sum g(s) \beta(s),$$

confirming stochastic domination. The inequality in the chain is the positive correlations
property of totally ordered sets (which goes back to Chebyshev, see [14] §11.2), applied to the
increasing functions $g$ and $\alpha/\beta$ on $S$ with measure $\beta$. □

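Lemma 2.3. Let $(\Omega, S, V, \pi)$ be a monotone system. If $\mu$ is a distribution on $\Omega$ such that $\mu/\pi$
is increasing, then $\mu \succeq \mu_v$ for any $v \in V$.

Proof. Let $g$ be increasing. If $\sigma \in \Omega$ satisfies $\mu(\sigma_v^\bullet) > 0$, then $\mu/\mu_v$ is increasing on $\sigma_v^\bullet$. By
Lemma 2.2 (applied to $\{s \in S : \sigma_v^s \in \Omega\}$ in place of $S$), for such $\sigma$ we have

$$\sum_{s \,:\, \sigma_v^s \in \Omega} g(\sigma_v^s) \frac{\mu(\sigma_v^s)}{\mu(\sigma_v^\bullet)} \ge \sum_{s \,:\, \sigma_v^s \in \Omega} g(\sigma_v^s) \frac{\mu_v(\sigma_v^s)}{\mu(\sigma_v^\bullet)}.$$

Multiplying by $\mu(\sigma_v^\bullet)$ and summing over all choices of $\sigma_v^\bullet$ gives

$$\sum_{\sigma} g(\sigma) \mu(\sigma) \ge \sum_{\sigma} g(\sigma) \mu_v(\sigma),$$

establishing the required stochastic dominance. □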

Lemma 2.4. Let $(\Omega, S, V, \pi)$ be a monotone system, and let $\mu$, $\nu$ be two arbitrary distributions
on $\Omega$. If $\nu/\pi$ is increasing on $\Omega$ and $\nu \preceq \mu$, then $\|\nu - \pi\| \le \|\mu - \pi\|$.

Proof. Let $A = \{\sigma : \nu(\sigma) > \pi(\sigma)\}$. Then the indicator of $A$ is increasing, so

$$\|\nu - \pi\| = \sum_{\sigma \in A} (\nu(\sigma) - \pi(\sigma)) = \nu(A) - \pi(A) \le \mu(A) - \pi(A),$$

since $\nu \preceq \mu$. The right-hand side is at most $\|\mu - \pi\|$. □

Theorem 2.5. Let $(\Omega, S, V, \pi)$ be a monotone system. Let $\mu$ be the distribution on $\Omega$ which
results from successive updates at sites $u_1, \ldots, u_k$, beginning at the top configuration. Let $\nu$ be
defined similarly but with the update at $u_j$ left out. Then

1. $\mu \preceq \nu$, and

2. $\|\mu - \pi\| \le \|\nu - \pi\|$.

Proof. Let $\mu^0$ be the distribution concentrated at the top configuration, and $\mu^i = (\mu^{i-1})_{u_i}$ for
$i \ge 1$. Applying Lemma 2.1 inductively, we have that each $\mu^i/\pi$ is increasing, for $0 \le i \le k$.
In particular, we see from Lemma 2.3 that $\mu^{j-1} \succeq (\mu^{j-1})_{u_j} = \mu^j$.

If we define $\nu^i$ in the same manner as $\mu^i$, except that $\nu^j = \nu^{j-1}$, then because stochastic
dominance persists under updates, we have $\nu^i \succeq \mu^i$ for all $i$; when $i = k$, we get $\mu \preceq \nu$ as
desired.

For the second statement of the theorem, we merely apply Lemma 2.4, noting that $\nu^k/\pi$
is increasing by the same inductive argument used for $\mu$. □

Proof of Theorem 1.1. Apply Theorem 2.5 inductively, censoring one site at a time. This
establishes the case where the update locations are deterministic. In the case where the update
sequence $v_1(\xi), \ldots, v_m(\xi)$ that yields $\mu$ is random (defined on some probability space $(\Xi, P_\Xi)$)
and its subsequence leading to $\nu$ is also random (defined on the same probability space),
conditioning on $\xi$ yields measures $\mu(\xi)$ and $\nu(\xi)$ such that $\mu(\xi) \preceq \nu(\xi)$ and $\nu(\xi)/\pi$ is increasing
on $\Omega$. These properties are preserved under averaging over $\Xi$, so we conclude that $\mu \preceq \nu$ and
$\nu/\pi$ is increasing on $\Omega$. The inequality between total variation norms follows from Lemma 2.4. □

3 Comparison of single site update schemes 

In practice, updates on a system $(\Omega, S, V, \pi)$ are often performed systematically rather than at
random. Typically a permutation of $V$ is fixed and sites are updated periodically in permutation
order. If the interaction graph is bipartite, it is possible and often convenient to update
all odd sites simultaneously, then all even sites, and repeat; we call this alternating updates.
To be fair, we count a full round of alternating updates as $n$ single updates, so that alternating
updates constitute a special case of systematic updating.

Mixing time may differ from one update scheme to another; for example, if there are no
interactions (so that one update per site produces perfect mixing) then systematic updating is
faster by a factor of $\frac{1}{2} \log n$ than uniformly random updates, since after $(\frac{1}{2} - \varepsilon) n \log n$ random
updates about $n^{1/2+\varepsilon}$ sites have not been hit, so counting the number of sites that still have
the initial spin implies the total variation distance to equilibrium is still close to 1. (For a
more general $\Omega(n \log n)$ lower bound for Glauber dynamics with random updates see [10].)
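
The count of unhit sites in the no-interaction example follows from a one-line computation: each site is missed by a single uniform update with probability $1 - 1/n$. As a rough numerical illustration (the function name and the values `n = 10_000` and `eps = 0.1` are assumptions for this sketch), the expected number of sites never chosen can be compared with the prediction $n^{1/2+\varepsilon}$:

```python
import math


def expected_unhit(n, updates):
    """Expected number of sites never chosen in `updates` uniform draws from n sites."""
    return n * (1 - 1 / n) ** updates


n = 10_000
eps = 0.1  # illustrative choice (assumption)
updates = int((0.5 - eps) * n * math.log(n))
unhit = expected_unhit(n, updates)
prediction = n ** (0.5 + eps)
```

For these values `unhit` and `prediction` agree closely, which is the sense of "about" in the paragraph above.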

Embarrassingly, there are only a few results to support the observation that mixing times 
for the various update schemes never seem to differ by more than a factor of log n and rarely by 
more than a constant. (See [7, 8] for some recent progress in the Dobrushin uniqueness regime.) 
Theorem 1.1 allows us to obtain some useful comparison results for monotone systems, but is 
still well short of what is suspected to be true. 

Theorem 3.1. Let $\mathcal{A}$ be the alternating update scheme, and $\mathcal{S}$ an arbitrary systematic update
scheme, for a bipartite monotone system $(\Omega, S, V, \pi)$. Then the mixing time for $\mathcal{S}$ (starting at
the top state) is no more than twice the mixing time for $\mathcal{A}$.

Proof. When updating according to $\mathcal{S}$, on odd passes we censor all even-site updates; on even
passes, all odd-site updates. Since successive updates of sites of the same parity commute, the
result is exactly $\mathcal{A}$, and an application of Theorem 1.1 shows that we mix at a cost of at most a
factor of 2. □

Theorem 3.2. Let $\mathcal{A}$ be the alternating update scheme, and $\mathcal{R}$ the uniformly random update
scheme, for a bipartite monotone system $(\Omega, S, V, \pi)$. Then the mixing time for $\mathcal{R}$ (starting at
the top state) is no more than $2 \log n$ times the mixing time for $\mathcal{A}$.

Proof. When updating according to $\mathcal{R}$, we censor all even-site updates until all odd sites are
hit; then we censor all odd-site updates until all even sites are hit, and repeat. Since each
of these steps takes $2(n/2)\log(n/2)$ updates on average, Theorem 1.1 guarantees a loss of at
most a factor of $2 \log n$. □
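
The figure of $2(n/2)\log(n/2)$ updates in the proof above is a coupon-collector estimate. As a rough illustration (the function name and the choice `n = 1000` are assumptions for this sketch), the exact expected number of uniformly random updates needed to hit all $n/2$ odd sites is $n H_{n/2}$, where $H_m$ denotes the $m$-th harmonic number, and it can be compared with the estimate directly:

```python
import math


def expected_updates_to_hit_half(n):
    """Expected number of uniform draws from n sites until all n/2 odd sites are hit.

    When j odd sites are still unhit, each draw hits a new one with
    probability j / n, so the expected wait for the next one is n / j.
    """
    m = n // 2
    return sum(n / j for j in range(1, m + 1))


n = 1000  # illustrative size (assumption)
exact = expected_updates_to_hit_half(n)
estimate = 2 * (n / 2) * math.log(n / 2)
ratio = exact / estimate
```

The exact value exceeds the estimate only by a term of order $n$, so `ratio` is close to 1 for large `n`.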

Theorem 3.3. Let $\mathcal{R}$ be the uniformly random update scheme, and $\mathcal{S}$ an arbitrary systematic
update scheme, for a monotone system $(\Omega, S, V, \pi)$ of maximum degree $\Delta_{\max}$. Then the mixing
time for $\mathcal{S}$ is no more than $O(\sqrt{\Delta_{\max} n})$ times the mixing time for $\mathcal{R}$.

Proof. Prior to implementing a round of $\mathcal{S}$, we choose uniformly random sites one by one as
long as no two are adjacent; since the probability of adjacency for a random pair of sites is at
most $(\Delta_{\max} + 1)/n$, this "birthday problem" procedure will keep about $\sqrt{n/\Delta_{\max}}$ updates.
All updates of sites not on this list are censored from the upcoming round of $\mathcal{S}$, incurring a
loss of a factor of $n/\sqrt{n/\Delta_{\max}} = \sqrt{\Delta_{\max} n}$. Since updates of non-adjacent sites commute,
Theorem 1.1 applies. □

If $(\Omega, S, V, \pi)$ is bipartite, then since the alternating scheme is a systematic scheme,
Theorem 3.3 applies to it as well.

From systematic updates to alternating or random updates, there seems to be nothing 
better to do in our context than to score one update per systematic round, incurring a factor 
of n penalty. 

3.1 Hanging subgraphs 

Let $H$ be a subgraph of the finite graph $G$, on which some system $(\Omega, S, V, \pi)$ is defined, and
suppose what is wanted is mixing on $H$. When continuous-time Glauber dynamics is employed,
it is natural to compare mixing time $T_H$ on $H$ by itself (that is, with the rest of $G$ destroyed)
with mixing time $T_{G|H}$ when all points of $G$ are being updated. Indeed, for the Ising model
(with no external field), we conjecture that $T_H$ never exceeds $T_{G|H}$, echoing a conjecture of
the first author for spectral gaps, cited in [22] and proved there when $G$ is a cycle. Putting it
another way, we think bigger is slower.

Because the Ising model is a Markov random field, and its stationary distribution on a 
single site is independent of the graph, it enjoys the following property: if only one vertex 
(say, x) of H is adjacent to vertices of G\H, then the stationary distribution on H is identical 
to the stationary distribution on G restricted to H. To see this, it suffices to note that either 
stationary distribution can be obtained by flipping a coin to determine the sign of x, then 
conditioning the rest of the configuration on the result. 

We can now make use of Theorem 1.1, together with monotonicity of the Ising model, to 
prove our conjecture in this limited case. 

Theorem 3.4. Let $H$ be a subgraph of the finite graph $G$ and suppose that at most one vertex
of $H$ is adjacent to vertices of $G \setminus H$. Begin in the all $+$ state and fix a mixing tolerance $\varepsilon$
for continuous-time Glauber dynamics. Then $T_H(\varepsilon) \le T_{G|H}(\varepsilon)$.

Proof. The result is of course trivial if $H$ is disconnected from $G \setminus H$; otherwise let $x$ be the
unique vertex of $H$ with neighbors outside $H$. Let $Q = (v_1, \ldots, v_k)$ be the target sites of a
sequence of updates on $H$, and let $Q'$ on $G$ be the result of replacing each update of $x$ in $Q$
by a block update of $\{x\} \cup (G \setminus H)$. Then, on account of the property noted above, the effects
of $Q$ and $Q'$ are identical on $H$.

If it were not the case that $T_H(\varepsilon) \le T_{G|H}(\varepsilon)$, then there would in particular be an update
sequence $Q$ for $H$ and a supersequence $Q^+$ for $G$, all added sites being outside $H$, such that
$Q^+$ gets $H$ closer by some $\delta > 0$ to stationarity than does $Q$. However, for large enough $j$,
we can replace the block updates in $Q'$ by $j$ single-site updates within $\{x\} \cup (G \setminus H)$ to get
a new update sequence $Q''$ which contains $Q^+$, but whose resulting distribution matches that
of $Q'$ (thus also $Q$) to within total variation $\delta/2$. This would force $Q^+$ to mix strictly better
than its supersequence $Q''$, contradicting Theorem 1.1. □